Non-Stationary Bandit Strategy for Rate Adaptation With Delayed Feedback

Similar resources

Multiarmed Bandit Problems with Delayed Feedback

In this paper we initiate the study of optimization of bandit type problems in scenarios where the feedback of a play is not immediately known. This arises naturally in allocation problems which have been studied extensively in the literature, albeit in the absence of delays in the feedback. We study this problem in the Bayesian setting. In presence of delays, no solution with provable guarante...

Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards

In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler’s objective is to maximize his cumulative expected earnings over some given horizon of play T. To do this, the gambler needs to acquire information about arms (ex...

On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems

A challenging variant of the MABP is the non-stationary bandit problem where the gambler must decide which arm to play while facing the possibility of a changing environment. In this paper, we consider the situation where the distributions of rewards remain constant over epochs and change at unknown time instants. We analyze two algorithms: the discounted UCB and the sliding-window UCB. We esta...

Delayed feedback during sensorimotor learning selectively disrupts adaptation but not strategy use.

In sensorimotor adaptation tasks, feedback delays can cause significant reductions in the rate of learning. This constraint is puzzling given that many skilled behaviors have inherently long delays (e.g., hitting a golf ball). One difference in these task domains is that adaptation is primarily driven by error-based feedback, whereas skilled performance may also rely to a large extent on outcom...

Broadcast Channels with Delayed Finite-Rate Feedback: Predict or Observe?

Most multiuser precoding techniques require accurate transmitter channel state information (CSIT) to maintain orthogonality between the users. Such techniques have proven quite fragile in time-varying channels because the CSIT is inherently imperfect due to estimation and feedback delay, as well as quantization noise. An alternative approach recently proposed by Maddah-Ali and Tse (MAT) allows for...

Journal

Journal title: IEEE Access

Year: 2020

ISSN: 2169-3536

DOI: 10.1109/access.2020.2988671